ABSTRACT

We present an approach for accurate estimation of the reconstruction
distortion in SNR scalable video coding with drift. Based on a linear
model of predictive video coding, we derive an algorithm to quantify
spatio-temporal drift properties subject to prediction structure
and motion information. This allows for low-complexity estimation
of the reconstruction distortion on a per-block basis. The accuracy
of the distortion estimation is experimentally verified. We then
utilize the method for quality layer assignment within the framework of
H.264/AVC scalable video coding (SVC), which is currently under
standardization. The quality layers allow for bit stream truncation in
a rate-distortion optimized sense. Compared to the quality layer
assignment as implemented in the SVC test model, backward
drift estimation achieves equivalent coding efficiency
with reduced complexity.

Index Terms: Error propagation, hierarchical B pictures, quality
layers, SVC, H.264/AVC

1.  INTRODUCTION 

Motion compensated temporal filtering (MCTF) has proven to provide
a robust basis for highly efficient scalable video coding.
Hierarchical temporal prediction based on B pictures can be seen as
a specific instantiation of MCTF. It is a fundamental element of
H.264/AVC scalable video coding (SVC), which is currently being jointly
developed by ISO/IEC MPEG and ITU-T VCEG [1, 2]. The hierarchical
B prediction structure enables effective attenuation of error
drift, which is inevitably caused if an SVC bit stream is decoded at
a bit rate lower than that used for operating the prediction loop at
the encoder [3]. The propagation of an encoder-decoder mismatch
at a particular spatio-temporal location in the reconstruction process
is strictly related to the prediction correspondences, and is therefore
strongly dependent on the motion information. In our previous work
[4], we developed a linear model of the prediction process that
takes the coding control into account, based on which the impact
of drift can be estimated. Using a heuristic approach to quantify the
propagation properties, it was shown that considering the potentially
remaining drift allows the bit allocation to be optimized
during encoding, which improves coding efficiency.

In this paper, we propose an analytic approach for quantification
of the spatio-temporal error propagation properties in predictive
video coding. Using the presented backward drift estimation
algorithm, accurate error propagation correspondences, considering the
exact mode information and sub-pel accurate motion vectors, can
be derived on a per-pixel basis. To reduce the computational
complexity, the algorithm is generalized to derive correspondences on a
per-block basis or a per-picture basis.

While in existing drift estimation approaches, such as [5, 6],
the expected distortion is tracked subject to a-priori knowledge of
quantization errors or quantization error probabilities, our approach
derives generic correspondences between spatio-temporal regions,
which can later be utilized to determine the impact of individual
quantization errors. This allows for accurate rate-distortion
optimized bit allocation. Furthermore, unlike [5, 6], our approach
is capable of coping with error correlations caused by error
propagation over different paths, which can frequently occur within
hierarchical B prediction structures. Additionally, sub-pel interpolation
can be accurately considered.

We verify the accuracy of our drift estimation method by predicting
the reconstruction distortion based on the per-pixel quantization
error. It is then applied to quality layer assignment in SVC. Use of
quality layers allows for bit stream truncation in a rate-distortion
optimized sense [2, 7]. We show that compared to the quality layer
assignment as implemented in the SVC test model, backward
drift estimation achieves equivalent coding
efficiency with reduced complexity. The approach is similarly
applicable to other bit allocation problems in predictive video coding
schemes with drift, such as [4, 8].

The paper is organized as follows. The key elements of the
investigated system are outlined in Sec. 2. In Sec. 3, we review the
formulation of our linear distortion model for predictive video
coding with drift. The drift estimation algorithm is developed in Sec. 4.
In Sec. 5, we provide experimental verification of our approach, and
utilize the method for optimized quality layer assignment in SVC.
Sec. 6 concludes the paper.

2.  INVESTIGATED  SYSTEM 
2.1.  Hierarchical  B  Pictures 

In Fig. 1a, a temporal prediction structure with hierarchical B
pictures and $T = 3$ levels is illustrated. $f_z^t$ denotes the picture at
temporal position $z$ and temporal resolution $t$. Here, $t = 0$ corresponds
to the coarsest temporal resolution, and $t = T - 1$ corresponds to
the finest resolution. The arrows indicate the prediction
dependencies, e.g. picture $f_2^1$ is predicted by a bi-directional motion
compensated reference picture generated from pictures $f_0^0$ and $f_4^0$.
In [1, 2], flexible reference picture positions can be signaled for prediction.
However, the SVC test model implements the fundamental dyadic
decomposition structure according to Fig. 1a, where a picture $f_z^t$ is
predicted from the two nearest neighboring pictures associated with
a temporal resolution less than $t$.
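
The dyadic structure can be illustrated with a short Python sketch. It assumes a group of $2^{T-1}$ pictures between consecutive key pictures and does not model prediction between key pictures; the function names `temporal_level` and `references` are hypothetical and are not taken from the SVC test model.

```python
def temporal_level(z: int, T: int) -> int:
    """Temporal level t of the picture at position z for a T-level dyadic structure."""
    gop = 1 << (T - 1)                      # assumed GOP size: 2^(T-1) pictures
    if z % gop == 0:
        return 0                            # key picture, coarsest resolution
    trailing_zeros = (z & -z).bit_length() - 1
    return (T - 1) - trailing_zeros


def references(z: int, T: int):
    """Positions of the two nearest pictures with a temporal level below that of z."""
    t = temporal_level(z, T)
    if t == 0:
        return ()                           # key pictures are not modeled here
    step = 1 << (T - 1 - t)
    return (z - step, z + step)


levels = [temporal_level(z, 3) for z in range(5)]   # T = 3 as in Fig. 1a
refs_of_2 = references(2, 3)                        # picture f_2^1
```

For $T = 3$, the picture at position 2 is assigned level 1 and references positions 0 and 4, in agreement with the example given for Fig. 1a.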
Fig. 1. (a) Hierarchical B prediction structure with T = 3 levels.
(b) Progressive refinement quantization with N FGS layers.


2.2.  Progressive  Refinement  Quantization 

SNR scalability in SVC is enabled by utilization of either
enhancement slices (CGS/MGS), or progressive refinement slices (FGS) [2].
The approaches differ in their respective quantization and coding
schemes, representing different trade-offs between complexity and
granularity of scalability. While our presented distortion model holds
equally for either scheme, we employ FGS coding in this paper.

The basic FGS quantization principle is illustrated in Fig. 1b.
Here, $p^n$ denotes the unquantized prediction residual information,
$0 \le n \le N$, and $c^n$ denotes the quantized transform coefficients
of the $n$-th FGS layer, where $n = 0$ corresponds to the quality base
layer. Note that when multiple prediction loops are operated at the
encoder, the input signals $p^n$ may be different from each other.

The forward and backward spatial transform operations are
denoted as $S$ and $S^{-1}$, respectively, and the quantization stage
associated with layer $n$ is represented by $Q^n$. For each FGS layer,
quantization is performed on the refinement information relative to
the preceding layers. To allow for progressive refinement, the
quantizer step sizes must be monotonically decreasing with $n$. In SVC,
the step size is halved with each FGS layer.
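
The refinement principle can be sketched as follows. This is a simplified illustration, not the SVC FGS coding scheme: it assumes a uniform scalar quantizer without dead zone applied to a single sample, and the names `refine`, `base_step` and `n_layers` are hypothetical.

```python
def refine(p: float, base_step: float, n_layers: int):
    """Successively refine sample p; the step size is halved with each layer."""
    recon = 0.0
    reconstructions = []
    for n in range(n_layers):
        step = base_step / (2 ** n)          # monotonically decreasing step size
        c = round((p - recon) / step)        # quantize refinement relative to previous layers
        recon += c * step
        reconstructions.append(recon)
    return reconstructions


p = 10.3
recs = refine(p, base_step=8.0, n_layers=3)
errors = [abs(p - r) for r in recs]
```

After layer $n$, the remaining error is bounded by half the step size of that layer, so each additional layer halves the error bound.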

SVC enables fine granular SNR scalability on the bit stream
level by truncation of network abstraction layer units containing
progressive refinement information (PR-NALUs). To allow for
low-complexity rate-distortion optimized bit stream truncation, the relative
importance of each of the respective NALUs can be signaled in the
NALU headers. This concept is denoted as quality layers [7].

3.  MODEL-BASED  ESTIMATION  OF  THE 
RECONSTRUCTION  DISTORTION 

In the following, we review the linear formulation of the
prediction operations introduced in [4, 8], and derive a linear distortion
model for an SVC codec. While throughout this paper we assume
that a single prediction loop is operated at the highest FGS layer [3],
i.e. $\mathbf{p}^n = \mathbf{p}^N = \mathbf{p}, \forall n$, the model can be generalized for the case of
multiple prediction loops according to [4]. Fixed motion information
is assumed, and non-linear effects (rounding, clipping, deblocking)
are neglected. We denote $\mathbf{x}$ as an $L \times 1$ vector comprising the $L$
original (unquantized) samples of a dependently coded subset of the
input video sequence (e.g., see Fig. 1a). The linear prediction
operations are expressed as follows.

$$\mathbf{p} = \mathbf{x} - \mathbf{M}\mathbf{x}^{enc} - \mathbf{I}\mathbf{x}^{bl} - \mathbf{k} \qquad (1)$$

Here, the unquantized prediction residual is represented as an $L \times 1$
vector $\mathbf{p}$. The $L \times L$ matrices $\mathbf{M}$ and $\mathbf{I}$ express the
motion-compensated temporal prediction and the directional intra
prediction, respectively. $\mathbf{x}^{enc}$, $\mathbf{x}^{bl}$ and $\mathbf{k}$ are $L \times 1$ vectors, where $\mathbf{x}^{enc}$
denotes the prediction reference used within the encoder prediction
loop, and $\mathbf{x}^{bl}$ represents the sequence reconstructed from the quality
base layer without FGS refinement. $\mathbf{k}$ represents a static prediction
such as intra DC prediction. Note that $\mathbf{x}^{bl}$ is used as intra prediction
reference [2] since it is guaranteed to be available to the decoder
regardless of the amount of FGS refinement. Thus, any intra
prediction drift is avoided. Assuming the samples in $\mathbf{x}$ are arranged in
macroblock coding order, $\mathbf{M}$ and $\mathbf{I}$ are strictly lower triangular
matrices. Moreover, since intra prediction is constrained [2], $I_{ij} = 0$
unless both $i$ and $j$ are indices of intra coded pixels.

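The spatial forward transform, quantization, and backward
transform processes can be formulated as

$$\mathbf{c} = \mathbf{S}\mathbf{p}, \qquad (2)$$

$$\tilde{\mathbf{p}} = \mathbf{S}^{-1}(\mathbf{c} + \mathbf{q}) = \underbrace{\mathbf{S}^{-1}\mathbf{c}}_{\mathbf{p}} + \underbrace{\mathbf{S}^{-1}\mathbf{q}}_{\mathbf{e}}. \qquad (3)$$

Here, $\mathbf{S}$ and $\mathbf{S}^{-1}$ are $L \times L$ matrices expressing the spatial forward
and backward transform operations, respectively, generating the $L \times 1$
vector of transform coefficients, $\mathbf{c}$. Quantization is represented by
addition of an $L \times 1$ random vector $\mathbf{q}$, which depends on the decoded
bit rate. $\tilde{\mathbf{p}}$ and $\mathbf{e}$ denote the reconstructed residual signal and the
quantization error after backward transform, respectively.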

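The reconstruction is generated based on the prediction
references available to the decoder. From Eq. (3) and Eq. (1), it follows

$$\tilde{\mathbf{x}} = \tilde{\mathbf{p}} + \mathbf{M}\tilde{\mathbf{x}} + \mathbf{I}\mathbf{x}^{bl} + \mathbf{k} = \mathbf{x} + \mathbf{M}(\tilde{\mathbf{x}} - \mathbf{x}^{enc}) + \mathbf{e}. \qquad (4)$$

Decoding the full bit stream including all FGS layers is equivalent to
a classical drift-free prediction scheme with $\tilde{\mathbf{x}} = \mathbf{x}^{enc}$ and

$$\mathbf{x}^{enc} = \mathbf{x} + \mathbf{e}^{enc}. \qquad (5)$$

For reconstruction with quantization error $\mathbf{e}^{dec}$, substituting Eq. (5)
into Eq. (4) yields [4]

$$\tilde{\mathbf{x}} - \mathbf{x} = \mathbf{e}^{dec} + \mathbf{B}(\mathbf{e}^{dec} - \mathbf{e}^{enc}), \qquad (6)$$

with

$$\mathbf{B} + \mathbf{1} = (\mathbf{1} - \mathbf{M})^{-1}. \qquad (7)$$

Here, $\mathbf{1}$ is the $L \times L$ identity matrix, and $\mathbf{B}$ is a strictly lower
triangular $L \times L$ matrix.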
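
Eq. (6) and Eq. (7) can be checked numerically on a toy example. The sketch below is an illustration under simplifying assumptions: one sample per picture, five pictures in coding order (two key pictures followed by three bi-predicted pictures with weights 1/2), no intra or static prediction, and arbitrarily chosen error values. Because $\mathbf{M}$ is strictly lower triangular, $\mathbf{B}$ is obtained here in a single pass in reverse coding order instead of an explicit matrix inversion. The function names are hypothetical.

```python
def correspondence_matrix(M):
    """Return B with B + 1 = (1 - M)^{-1} for a strictly lower triangular M."""
    L = len(M)
    B = [[0.0] * L for _ in range(L)]
    # Reverse coding order: when sample k is visited, column k of B is final.
    for k in range(L - 1, -1, -1):
        for j in range(k):
            if M[k][j] == 0.0:
                continue
            B[k][j] += M[k][j]                   # direct dependency of k on j
            for i in range(k + 1, L):
                B[i][j] += B[i][k] * M[k][j]     # paths from j to i that pass through k
    return B


def decode(x, M, e_enc, e_dec):
    """Reconstruct with quantization error e_dec; the encoder loop used e_enc."""
    L = len(x)
    x_enc = [x[i] + e_enc[i] for i in range(L)]                               # Eq. (5)
    p = [x[i] - sum(M[i][j] * x_enc[j] for j in range(i)) for i in range(L)]  # Eq. (1)
    x_rec = [0.0] * L
    for i in range(L):
        x_rec[i] = p[i] + e_dec[i] + sum(M[i][j] * x_rec[j] for j in range(i))
    return x_rec


M = [[0.0, 0.0, 0.0, 0.0, 0.0],
     [1.0, 0.0, 0.0, 0.0, 0.0],
     [0.5, 0.5, 0.0, 0.0, 0.0],
     [0.5, 0.0, 0.5, 0.0, 0.0],
     [0.0, 0.5, 0.5, 0.0, 0.0]]
x = [100.0, 104.0, 102.0, 101.0, 103.0]
e_enc = [1.0, -1.0, 0.5, -0.5, 0.25]
e_dec = [2.0, -2.0, 1.0, -1.0, 0.5]

B = correspondence_matrix(M)
x_rec = decode(x, M, e_enc, e_dec)
simulated = [x_rec[i] - x[i] for i in range(5)]
predicted = [e_dec[i] + sum(B[i][j] * (e_dec[j] - e_enc[j]) for j in range(5))
             for i in range(5)]
```

In this example, the simulated reconstruction error and the error predicted by Eq. (6) coincide, since the toy decoder contains none of the neglected non-linear effects.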

Since there is no inter prediction within a picture, we observe
that $B_{ij} = 0$ if both $i, j \in f_z$. Furthermore, we assume that
quantization errors in different pictures are uncorrelated, i.e. $E[e_i e_j] = 0$
if the indices $i$ and $j$ do not belong to the same picture, with $E[\cdot]$
denoting the expectation. For samples within a given picture $f_z$, we
further assume that

$$\sum_{i=0}^{L-1} \sum_{\substack{j=0 \\ j \neq i}}^{L-1} E\Big[ (e_i^{dec} - e_i^{enc})(e_j^{dec} - e_j^{enc}) \sum_{k \in f_z} B_{ki} B_{kj} \Big] = 0. \qquad (8)$$

This is reasonable for high bit rates, where quantization errors can
be assumed to be uncorrelated. It is also reasonable for
homogeneous full-pel motion, where neighboring quantization errors should
not interact during reconstruction, i.e. $B_{ki} B_{kj} = 0$. With these
assumptions, we can formulate the expected quadratic distortion of
picture $f_z$ as follows.


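$$E[D_z] = E\Big[ \sum_{i \in f_z} (\tilde{x}_i - x_i)^2 \Big] = E\Big[ \sum_{i \in f_z} \Big( (e_i^{dec})^2 + \sum_{j=0}^{L-1} B_{ij}^2 (e_j^{dec} - e_j^{enc})^2 \Big) \Big] \qquad (9)$$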




It can be seen that the squared matrix elements $B_{ij}^2$ determine the
expected distortion contribution to the reconstruction at position $i$,
caused by a drift term introduced at position $j$. Note that owing to
the generic formulation of the model, the elements $B_{ij}^2$ can reflect
any temporal prediction structure. In particular, sub-pel interpolation
is seamlessly integrated, and for hierarchical prediction structures
(Fig. 1a), the case where multiple correspondences over different paths
exist between $i$ and $j$ is accurately represented.
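
As an illustration, Eq. (9) can be evaluated directly once the correspondence factors are known. The sketch below uses a toy setting with one sample per picture; the matrix `B` and the error values are assumed for the example only, and `expected_distortion` is a hypothetical name. Since Eq. (9) drops all cross terms, the estimate is an expectation and need not match the squared error of one particular realization.

```python
def expected_distortion(B, e_dec, e_enc, picture):
    """Evaluate Eq. (9) for the samples whose indices are listed in `picture`."""
    drift = [d - e for d, e in zip(e_dec, e_enc)]     # e_j^dec - e_j^enc
    total = 0.0
    for i in picture:
        total += e_dec[i] ** 2                        # local quantization error
        total += sum(B[i][j] ** 2 * drift[j] ** 2     # propagated drift contributions
                     for j in range(len(drift)))
    return total


# Assumed toy correspondence matrix: five samples in coding order.
B = [[0.0, 0.00, 0.0, 0.0, 0.0],
     [1.0, 0.00, 0.0, 0.0, 0.0],
     [1.0, 0.50, 0.0, 0.0, 0.0],
     [1.0, 0.25, 0.5, 0.0, 0.0],
     [1.0, 0.75, 0.5, 0.0, 0.0]]
e_enc = [1.0, -1.0, 0.5, -0.5, 0.25]
e_dec = [2.0, -2.0, 1.0, -1.0, 0.5]

d_key = expected_distortion(B, e_dec, e_enc, picture=[0])   # no drift reaches sample 0
d_b = expected_distortion(B, e_dec, e_enc, picture=[3])     # drift from samples 0, 1, 2
```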


4.  BACKWARD  DRIFT  ESTIMATION  ALGORITHM 


Considering the triangularity of $\mathbf{M}$ and $\mathbf{B}$, we derive an algorithm to
determine the correspondence factors $B_{ij}^2$ as follows. From Eq. (7)
it can be shown that $\mathbf{B} = (\mathbf{B} + \mathbf{1})\mathbf{M}$. Equivalently, we write

$$B_{ij}(l) := M_{ij} + \sum_{k=L-1}^{l+1} B_{ik} M_{kj}, \quad \forall i > j, \qquad (10)$$

such that $B_{ij} = B_{ij}(-1)$. Note that the sum index $k$ is counted in
descending order. This definition is convenient in the derivation of
the algorithm below. To allow for both pixel based and block based
derivation of correspondence factors, we formulate the expectation
of the correspondence between two blocks $\mathcal{I}$ and $\mathcal{J}$, with $A = |\mathcal{I}| =
|\mathcal{J}|$ the number of pixels per block.


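$$B_{\mathcal{I}\mathcal{J}}(l) := E_{i \in \mathcal{I}, j \in \mathcal{J}}\big[ B_{ij}(l) \big] = E_{i \in \mathcal{I}, j \in \mathcal{J}}\Big[ M_{ij} + \sum_{k=L-1}^{l+1} B_{ik} M_{kj} \Big] \qquad (11)$$

$$BS_{\mathcal{I}\mathcal{J}}(l) := E_{i \in \mathcal{I}, j \in \mathcal{J}}\big[ B_{ij}^2(l) \big] = E_{i \in \mathcal{I}, j \in \mathcal{J}}\Big[ M_{ij}^2 + \sum_{k=L-1}^{l+1} B_{ik}^2 M_{kj}^2 + 2 \sum_{k=L-1}^{l+1} M_{kj} B_{ik} \Big( M_{ij} + \sum_{m=L-1}^{k+1} B_{im} M_{mj} \Big) \Big] \qquad (12)$$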


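Here, $B_{\mathcal{I}\mathcal{J}} = B_{\mathcal{I}\mathcal{J}}(-1)$ and $BS_{\mathcal{I}\mathcal{J}} = BS_{\mathcal{I}\mathcal{J}}(-1)$. With Eq. (10),
we write the last term in Eq. (12) as

$$E_{i \in \mathcal{I}, j \in \mathcal{J}}\Big[ 2 \sum_{k=L-1}^{l+1} M_{kj} B_{ik} B_{ij}(k) \Big] = \frac{2}{A} \sum_{j \in \mathcal{J}} \sum_{k=L-1}^{l+1} M_{kj} \Big( B_{\mathcal{I}\mathcal{K}} B_{\mathcal{I}\mathcal{J}}(k) + \rho \sqrt{ \big( BS_{\mathcal{I}\mathcal{K}} - B_{\mathcal{I}\mathcal{K}}^2 \big) \big( BS_{\mathcal{I}\mathcal{J}}(k) - B_{\mathcal{I}\mathcal{J}}^2(k) \big) } \Big), \qquad (13)$$

where $\mathcal{K}$ denotes the block containing pixel $k$, and $\rho$ accounts for
cross-correlations between the elements contributing to $B_{\mathcal{I}\mathcal{K}}$ and
$B_{\mathcal{I}\mathcal{J}}(k)$. Finally, from Eq. (11) - Eq. (13), we
derive the following algorithm.

    initialize $B_{\mathcal{I}\mathcal{J}} = 0$, $BS_{\mathcal{I}\mathcal{J}} = 0$, $\forall \mathcal{I}, \mathcal{J}$
    scan sequence in reverse coding order, $\forall \mathcal{K}$
        scan pixels $k \in \mathcal{K}$ (direct establishing)
            scan $j \in \mathcal{J}$, $\forall \mathcal{J}$, such that $M_{kj} \neq 0$
                $B_{\mathcal{K}\mathcal{J}} \leftarrow B_{\mathcal{K}\mathcal{J}} + \frac{1}{A} M_{kj}$
                $BS_{\mathcal{K}\mathcal{J}} \leftarrow BS_{\mathcal{K}\mathcal{J}} + \frac{1}{A} M_{kj}^2$
        scan $\forall \mathcal{I}$, such that $BS_{\mathcal{I}\mathcal{K}} \neq 0$
            scan pixels $k \in \mathcal{K}$ (indirect establishing)
                scan $j \in \mathcal{J}$, $\forall \mathcal{J}$, such that $M_{kj} \neq 0$
                    $\delta \leftarrow B_{\mathcal{I}\mathcal{K}} B_{\mathcal{I}\mathcal{J}} + \rho \sqrt{ (BS_{\mathcal{I}\mathcal{K}} - B_{\mathcal{I}\mathcal{K}}^2)(BS_{\mathcal{I}\mathcal{J}} - B_{\mathcal{I}\mathcal{J}}^2) }$
                    $B_{\mathcal{I}\mathcal{J}} \leftarrow B_{\mathcal{I}\mathcal{J}} + \frac{1}{A} B_{\mathcal{I}\mathcal{K}} M_{kj}$
                    $BS_{\mathcal{I}\mathcal{J}} \leftarrow BS_{\mathcal{I}\mathcal{J}} + \frac{1}{A} BS_{\mathcal{I}\mathcal{K}} M_{kj}^2 + \frac{2}{A} M_{kj} \delta$

The video sequence is scanned in backward coding order. For each
motion compensated temporal prediction $M_{kj}$, the respective
elements $B_{\mathcal{K}\mathcal{J}}$, $BS_{\mathcal{K}\mathcal{J}}$ are updated (direct establishing). Furthermore,
for each existing correspondence $BS_{\mathcal{I}\mathcal{K}}$ originating from $\mathcal{K}$, the
elements $B_{\mathcal{I}\mathcal{J}}$, $BS_{\mathcal{I}\mathcal{J}}$ are updated (indirect establishing). For $A = 1$,
the algorithm accurately calculates the results of Eq. (11) - Eq. (13),
with $B_{ij}^2 = BS_{\mathcal{I}\mathcal{J}}$. For $A > 1$, $BS_{\mathcal{I}\mathcal{J}}$ is used as an approximation
for $B_{ij}^2$.

The computational complexity of the algorithm depends on the
number of establishing steps to be performed. The number of
indirect establishing steps depends on the respective number of existing
correspondences $BS_{\mathcal{I}\mathcal{K}}$, which can be roughly expected to scale with
$1/A$. For picture based derivation, the number of correspondences
equals the number of dependent pictures. For a $T$-level B prediction
structure with $T > 2$, it can be shown that the average number of
pictures depending on a B picture is less than $T - 1$. Hence, since
each B picture requires one direct establishing step, at most $T$
establishing steps are performed on average. For a given $k$, the inner loops
over $j$ for $M_{kj} \neq 0$ represent the prediction dependencies including
interpolated pixels. For picture based derivation, the contributions of
the individual interpolation taps can be summed up instead of
processing each tap separately. It can therefore be assumed that the
complexity of each establishing step is $c < 1$, where $c = 1$
represents the complexity of the motion compensation operations for the
respective picture. It follows that the total complexity of the
algorithm is at most $cT$ times the complexity of the motion
compensation operations used for reconstruction of the video sequence.

5. EXPERIMENTAL RESULTS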


In our experiments, we use the SVC test model JSVM-6 [3] with
hierarchical B pictures. We use eight QCIF 15 Hz test sequences with
different characteristics, and encode with $T = 5$ temporal levels and
$N = 2$ FGS layers.

In the first experiment, we verify the accuracy of the distortion
model and the drift estimation algorithm. We use two different block
sizes, 4 x 4 and 176 x 144 (picture size), to quantify the drift
correspondences. We extract each bit stream at 11 equally distributed
target bit rates, and determine the quantization errors $\mathbf{e}^{dec}$. $\mathbf{e}^{enc}$
is obtained from the non-truncated bit stream. We then use Eq. (9)
to estimate the reconstruction distortion for each picture, and based
on that, we calculate the estimated mean PSNR over each of the
sequences. We thus obtain estimated PSNR measures for each of the
extracted bit rates without performing the actual reconstruction.
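
The conversion from estimated per-picture distortions to a mean PSNR can be sketched as follows. The sketch assumes 8-bit samples (peak value 255), distortions given as sums of squared errors per picture, and averaging of the per-picture PSNR values; these conventions are assumptions made for illustration, and `mean_psnr` is a hypothetical name. Pictures with zero estimated distortion are not handled.

```python
import math


def mean_psnr(distortions, n_pixels, peak=255.0):
    """Mean PSNR in dB over pictures, from per-picture sums of squared errors."""
    values = []
    for d in distortions:
        mse = d / n_pixels                                   # mean squared error of the picture
        values.append(10.0 * math.log10(peak * peak / mse))
    return sum(values) / len(values)


n_pixels = 176 * 144                                         # QCIF luma samples per picture
# Two example pictures with mean squared errors of 65.025 and 6.5025.
psnr_est = mean_psnr([65.025 * n_pixels, 6.5025 * n_pixels], n_pixels)
```

With the two example values, the per-picture PSNR is approximately 30 dB and 40 dB, which gives a mean of approximately 35 dB.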

Table 1 depicts the mean and maximum absolute differences of
the estimated measures as compared to the true PSNR values
obtained after decoding, for different values of $\rho$, with
$\Delta = PSNR_{est} - PSNR_{true}$. It can be seen that a mean absolute estimation error of
about 0.2 dB over all tested sequences can be achieved. Although
the minimal achievable estimation errors are slightly lower for the
case of 4 x 4 block based drift estimation, the results are very similar
to those of picture based estimation. We also observed similar results for
pixel based derivation. This indicates that the distortion estimation
error is primarily caused by inaccurate approximations used in the
distortion model, see Sec. 3. We conclude that for picture based
optimization of the bit allocation, it is sufficiently accurate to derive the
drift correspondences on a per-picture basis. Derivation on smaller
block bases will be advantageous for more localized bit allocation,
such as in [4, 8].

            4 x 4                          176 x 144
  $\rho$   mean $|\Delta|$   max $|\Delta|$     $\rho$   mean $|\Delta|$   max $|\Delta|$
  0.0      0.350             1.392              0.0      0.376             1.452
  0.2      0.259             1.091              0.1      0.315             1.233
  0.4      0.191             0.725              0.2      0.267             1.004
  0.6      0.233             0.526              0.3      0.233             0.753
  0.8      0.414             0.899              0.4      0.245             0.577
  1.0      0.695             1.347              0.5      0.305             0.753

Table 1. Mean and maximum absolute PSNR estimation error [dB].

We now use our distortion estimation technique for quality layer
assignment. After encoding a sequence, the backward drift
estimation algorithm is performed on a per-picture basis with $\rho = 0.3$. We
then extract the quantization errors $\mathbf{e}^{enc}$, $\mathbf{e}^{dec}$ corresponding to the
base layer and each of the FGS layers. Based on that, starting with
the highest FGS layer, we use the distortion model to compute the
estimated PSNR. Then the expected decrease $\Delta D_z$ in PSNR is
calculated for the least significant remaining PR-NALU of each picture
$f_z$, with $\Delta R_z$ the bit budget of that NALU. The PR-NALU with the
lowest rate-distortion slope $\Delta D_z / \Delta R_z$ is assigned the least
significant quality layer. Starting from the PSNR estimated after removal
of that NALU, the procedure is repeated until all PR-NALUs are assigned a quality layer.
The obtained quality information is possibly merged according to
[7], such that the maximum number of quality layers defined in SVC
is not exceeded.
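
The assignment loop can be sketched as follows. This is a schematic illustration only: the PSNR estimator is passed in as a callback, and the toy estimator, the rate table and all names (`assign_quality_layers`, `toy_psnr`, `gains`, `rate`) are hypothetical and stand in for the model-based estimation of Sec. 3. Merging of quality layers according to [7] is not included.

```python
def assign_quality_layers(n_pictures, n_fgs, rate, estimate_psnr):
    """Return PR-NALUs as (picture, FGS layer), ordered from least to most important."""
    remaining = [n_fgs] * n_pictures             # FGS layers still kept per picture
    order = []
    psnr = estimate_psnr(remaining)
    while any(remaining):
        best = None
        for z in range(n_pictures):
            if remaining[z] == 0:
                continue
            trial = list(remaining)
            trial[z] -= 1                        # drop least significant remaining NALU of z
            delta_d = psnr - estimate_psnr(trial)        # expected PSNR decrease
            delta_r = rate[z][remaining[z] - 1]          # bit budget of that NALU
            slope = delta_d / delta_r
            if best is None or slope < best[0]:
                best = (slope, z)
        z = best[1]
        order.append((z, remaining[z]))
        remaining[z] -= 1
        psnr = estimate_psnr(remaining)          # continue from the new estimate
    return order


# Toy stand-in for the distortion model: each kept NALU adds a fixed PSNR gain.
gains = [[2.0, 1.0], [1.5, 0.5]]
rate = [[1200, 800], [1000, 500]]


def toy_psnr(remaining):
    return 30.0 + sum(sum(gains[z][:n]) for z, n in enumerate(remaining))


order = assign_quality_layers(2, 2, rate, toy_psnr)
```

In the toy example, the second FGS layer of picture 1 has the lowest slope and is therefore assigned the least significant quality layer.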

The resulting coding performance of the scalable bit stream after
quality layer assignment is depicted for an example in Fig. 2. It can be
seen that compared to the quality layer assignment in JSVM-6, our
approach provides equivalent coding gain. Furthermore, while the
quality layer assignment in JSVM-6 requires $2NT$ decoding passes
to establish the required rate-distortion dependencies [3], our
algorithm is less complex, requiring the equivalent of $cT$ decoding
passes, with $c < 1$.


6.  CONCLUSION 

We have presented a model-based method for drift estimation in
SNR scalable video coding. The accuracy of the prediction has been
experimentally verified. The method has then been utilized for
quality layer assignment within the framework of H.264/AVC scalable
video coding. For the case of single-loop FGS coding, compared
to the quality layer assignment method in the SVC test model, our
approach achieves equivalent coding efficiency with reduced
computational complexity.

The presented distortion estimation approach can also be
utilized for applications requiring consideration of spatially localized
drift properties, such as macroblock-based bit allocation. Owing to its
generality, the algorithm can in principle be used for estimation of
error propagation in any lifting-based MCTF system.


Fig. 2. Simulation results for FGS coding with quality layer
assignment (QLA) based on backward drift estimation (BDE).